ABSTRACT



Problems in indexing library and information science 
literature occur because of the speed of introduction of new terms, 
the nature of class headings, and the uncertain terminology of the 
field. Vocabulary control requires control over the concepts selected 
(the depth of indexing), the form of expression of concepts and the 
syndetic apparatus of the index. The context in which vocabulary 
terms appear, subject and aspect, subject and class entry, other 
types of entries (author, title, series, etc.), depth of indexing,
citation and keyword indexing, centralized and decentralized 
indexing, subject lists and thesauri, subject heading and 
classification are discussed. Indexing research ignores the codified 
record of past indexing experience including that of library subject 
heading work, which is the most carefully codified and tested, 
because of its use for a relatively shallow form of indexing. Ten 
general guidelines for planning indexing services for the literature 
are formulated, and aid to "Library Literature" is proposed for exploring 
new production methods and research, and for expansion in staff, depth, 
and scope, so as to produce a model index. (Related documents are 
LI 002 796 - 800 and LI 002 802 - 002 807). (AB) 
INTRODUCTION



In a sense, the title of this paper may be misleading. 
Except at the level of decisions about particular individual 
terms, there are no vocabulary control problems of indexing 
which are peculiar to the literature of librarianship and 
information science. 

If librarianship were to be classified according to the 
resemblance of its indexing vocabulary control problems to 
those of other disciplines, it would quite clearly belong to 
the soft or social sciences rather than to technology or to 
the hard or exact sciences. 

The terminology of librarianship and information science 
is much more imprecise and shifting than that of science and 
technology. The concepts selected for indexing are, as in 
the social sciences, likely to include a higher proportion 
of titles of works and names of persons and institutions 
than would be the case for the exact sciences, and a lower 
proportion of names of substances, procedures, and devices. 

But for the purposes of this discussion, even this re- 
semblance does not mean too much. While the proportions may 
differ, all types of vocabulary control problems occur in 
indexing the literature of librarianship. The field itself 
appears to be becoming, rather slowly and painfully, somewhat 
more of a science, at least in its more technical aspects. 

By its nature as a service profession, however, librarianship 
will inevitably remain social-science oriented as well. 
The field simply combines the vocabulary control problems to 
be found in almost all other subject areas or disciplines. 

Further, the question of vocabulary control cannot, at 
least at this point in the development of the art, be 
profitably considered in any pristine isolation. The question 
of vocabulary control, in the broad sense, is the key 
question of subject indexing. The question of subject indexing 
is in turn the basic question of information science. 

The intent of this paper, then, is not to provide answers 
to the question of vocabulary control in the indexing 
of our literature, but to indicate what some of the issues 
are, to try to clear up some current misunderstandings, to 
advance some tentative conclusions, and to suggest suitable 
areas for further exploration and research. 
BASIC UNITS

Vocabulary control may be said to involve three basic 
elements which it will be useful to keep particularly in 
mind as this discussion proceeds. These elements are: 

1) Control over what concepts are to be selected, or 
definition of the scope of what constitutes indexable matter, 
and 

2) Control over the form of expression of these concepts 
in the resulting index, together with 

3) Control of the syndetic or cross-referencing apparatus 
of the index, together with appropriate scope notes, reverse 
cross-references, and other appropriate indications of 
relationships among indexing terms. 
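As a rough illustration of the third element, the Python sketch below records each cross-reference together with its reverse tracing, using the older Library of Congress tracing labels "x" (see from) and "xx" (see also from). The function name `syndetic_structure`, the record layout, and the sample headings are hypothetical, and scope notes are not modeled; it only shows that every reference made in the index implies a reciprocal record that must also be maintained.

```python
from collections import defaultdict


def syndetic_structure(see, see_also):
    """Build per-heading reference records, including reverse tracings.

    see:      mapping of non-preferred form -> authorized heading (hypothetical data)
    see_also: iterable of (from_heading, to_heading) pairs
    """
    records = defaultdict(lambda: {"see": [], "x": [], "see_also": [], "xx": []})
    for variant, preferred in see.items():
        records[variant]["see"].append(preferred)    # reference shown in the index
        records[preferred]["x"].append(variant)      # reverse: "see from" tracing
    for source, target in see_also:
        records[source]["see_also"].append(target)   # reference shown in the index
        records[target]["xx"].append(source)         # reverse: "see also from" tracing
    # Return headings in alphabetical order, as they would file in the index.
    return dict(sorted(records.items()))


if __name__ == "__main__":
    refs = syndetic_structure(
        {"Catalogs, Subject": "Subject headings"},
        [("Cataloging", "Subject headings")],
    )
    for heading, record in refs.items():
        print(heading, record)
```

Keeping the forward reference and its tracing in one step is a design choice for the sketch; it means a heading cannot be changed or dropped without the reverse records showing which references point at it.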

It is the second of these elements, control over the 
form of expression of the concepts, which is most often 
assumed to be the topic when vocabulary control is discussed in 
the literature; yet these three are interdependent elements 
which cannot really be separated. 



OTHER INFLUENCES 

These elements are, or at least should be, influenced in 
turn by such factors as the size, physical form, and probable 
uses of the index in question. Only a small percentage of 
the problems of vocabulary control can profitably be decided 
upon in isolation from these factors. 



CONTEXT 

Among factors of this kind, one of the most important 
is often, it seems to me, ignored in discussions of vocabulary 
control in indexing, and should certainly be kept in mind here. 
This is the matter of context which is to appear with or under 
the indexing vocabulary terms which we employ. 

Whether this context includes some things which sometimes 
are and sometimes are not considered part of the indexing 
vocabulary itself, such as modifiers, subject subdivisions, 
or form divisions, is of obvious importance. It is important, 
too, in planning vocabulary control to know whether the 
references appearing in the index under the vocabulary terms are to 
be some form of reference number or locator which in itself 
conveys no meaning to the index user, or whether the reference 
is to be, say, a relatively full bibliographic reference in 
which the information about the author, title, and so on serves 
as a further indexing modifier, discriminating among the 
references listed under the particular expression of the indexing 
concept which is our entry or access point. 
SUBJECT AND ASPECT

A few other basic distinctions may assist us on our way. 

I do not intend to deluge you with a group of specialized 
definitions, but for the purposes of this paper I would like to 
define a subject as the expression in words of the topic referred 
to, the text concept we are indexing. Only the subject 
proper is considered to fall under this definition, not the 
aspect of that subject which is treated in our text. For example, 
an article about the history of indexing is an article 
about the subject Indexing from its historical aspect, not about 
History. An article about circulation control in school libraries 
is an article about circulation control, and the aspect treated 
is that of this subject in a particular type of library. This 
distinction is easy to see in theoretical expression; it becomes 
a more difficult problem in practical indexing in some 
situations. 

Please note that my having made this distinction does not 
imply that entry under aspect rather than subject proper is 
precluded, but only that it is useful in indexing to know which 
is which, where possible, and to have an established policy for 
dealing with entry under subject or aspect. 



SUBJECT AND CLASS ENTRY 

A second distinction which will be useful later in this 
paper is one which I would like to make between subject and 
class entry. (Notice, please, the care with which I am avoiding 
the problem word "specific.") Subject entry is entry directly 
under the subject represented in the text, or under a synonym 
or preferable form of expression of that subject. Class entries 
are those for which the indexer translates such an expression 
of the text subject into a larger containing class name, as 
when he decides to enter an article about the Enoch Pratt 
Free Library under Public Libraries, or even just Libraries. 

Please notice that the use of class entry does not imply 
a classified arrangement of the resulting entries. The choice 
of entry procedure described does not necessarily imply faceted 
entries or chain indexing or alphabetico-classed entries like: 
Libraries - Public Libraries - United States - Baltimore, 
Maryland - Enoch Pratt Free Library. 

Also, please notice that I am not unaware of the philosophical 
truth that naming something actually constitutes classifying 
it, and that this is true even when the named class - a 
particular person or institution, say - has only a single member. 
This is a concept which is not useful for our current purposes, 
whereas the distinction between subject and class 
entry is both useful and fairly common in indexing. 



OTHER TYPES OF ENTRIES 

Subjects - that is, the things or concepts discussed in 
the literature - are not, of course, the only desirable or use- 
ful indexing access points. In addition to class, or form, or 
geographical, or time groupings (for example, entry by such 
things as literary or physical form of the work, or geographical 
areas or time periods discussed) and similar entries, we may 
have entry by various other handles: by author, by title, by 
series, by sponsoring institution, and so on. 

Expression of entries of the first type, once they have 
been selected for indexing, follows that for subject entries. 

In the case of the latter type, while we may feel that their 
choice is easy for the indexer, in practice there are many 
problems: cases, for instance, of multiple authorship, or of 
official subdivisions of larger corporate bodies where there is 
also a named individual author, and so on. 

We may note briefly that, contrary to popular belief, 
these problems do not disappear when we can make multiple entry 
as opposed to being limited to a single main entry. Excessive 
multiple entry is not simply a problem of economy; it also adds 
to the complexity of the structure of the index for the 
user, and hence to other problems of vocabulary control. 



FORM OF EXPRESSION OF THESE ENTRIES 

For such forms of entry as author, title, and so on, we 
need not only to know what to choose as indexable matter, but 
also how to regularize the form of expression of what we have 
chosen, a means of vocabulary control. Notice particularly 
that all of these forms of non-subject access which are of 
particular importance in indexing the literature of a 
discipline may also appear in the literature - and frequently 
do appear - as subjects as well. For certain types of 
material, they may constitute the majority of indexable 
matter. 
REGULARIZING THESE TYPES OF ENTRIES 

Regularizing the form of expression of such entries is 
by no means a negligible problem, as librarianly discussion 
over rules for catalog entry shows in theory, and as the 
difficulties posed in the Government-Wide Index in interfiling 
entries of this kind from different sources, made under different 
rules, vividly illustrate in practice. 

With these complexities themselves I do not intend to 
deal, partly through cowardice and partly because the 
complexities are well known and have been discussed in detail for 
some generations - at least since Panizzi and probably since 
Tritheim. I remind you of them because it is the tendency of 
librarians to forget, and of information scientists not to 
realize, that these are not simple matters. It is not even 
just a question of how to identify author or title or series 
entries and how then to express them, but also the problem of 
whether such entries are or are not useful in a particular index 
or indexing situation: when, for example, is a title 
non-distinctive? 



MODIFICATIONS, SUBDIVISIONS, AND REFERENCES 

I have briefly mentioned that indexes often include 
modifiers or subdivisions of topics, whether or not the access 
points are concepts in the text, and whether or not 
their expression is as subjects or as classes. Subdivisions 
are usually in some kind of regularized form, and a subdivision 
may include more than one reference. Modifiers, on the 
other hand, are intended to individuate for each reference 
the aspect of the subject treated, and are formalized only to 
a limited extent, such as by placing the most important 
word of the modifier at its beginning, except for prepositions 
or conjunctions. 

The nature of the reference itself may also constitute a 
form of modifier or discriminatory context for the index user. 
We are all aware that some tools for vocabulary control - 
notably, for example, the Sears list and the Library of 
Congress headings - explicitly include an elaborate 
framework for entry subdivisions, both their nature and their 
form of expression. 

Implicitly, these particular lists also include much more. 
They assume, in their structure and design, that a particular 
context and a particular form of expression - the unit card 
with all its tracings - will appear under the heading and its 
subdivisions. I would like to point out that this actually 
constitutes a further extension of both topic subdivision and 
its expression, and that it is inherently part of the 
structure of these particular tools. The form of expression of 
all of these elements, and their order, constitute, subject 
to various exceptions requiring individual judgment or 
interpretation, the factors determining the arrangement of the 
entries and the structure of the resulting catalog. 



DEFINITION OF INDEXABLE MATTER 

We will further explore the use of these library vocabulary 
control listings later, but one other aspect of them 
should be noted at this point. It is assumed that they will 
be used to regularize subjects chosen by a particular definition 
of the scope of indexable matter in the universe of material 
to be indexed, in this case books to be subject headed. This 
definition of the scope of indexable matter for this purpose 
we probably owe to Cutter, and Kaiser commented on it in the 
same context of discussing the indexing of the literature of 
particular topics which concerns us today. 

What is indexable matter in assigning a subject heading 
to a book is the subject or subjects of the book taken as a 
whole entity, not those subjects which may be discussed at 
various points within it. A book which is about library 
cataloging may contain the best information in the library 
about subject headings, but by the library definition subject 
headings as treated in this book do not constitute indexable 
matter. 

The principle, of course, may be and is extended to 
smaller units than books - to journal articles, reports, book 
chapters, and so on - but it is always applied to the 
bibliographic unit selected as a whole, and not to the information 
contained within that unit. 

I may be unduly belaboring something which is already 
quite obvious to an audience which consists mainly of librarians, 
but in the broader field of indexing, misunderstanding of this 
point appears to me to have caused both confusion in developing 
indexing systems and a significant failure to take maximum 
advantage of library experience in vocabulary control. Let me, 
even in a paper which already threatens to become overlong, 
develop this point. It is important to what is to follow. 
DEPTH OF INDEXING 



Library cataloging practice, especially in relation to 
vocabulary control, has not really been very seriously 
considered by modern indexing theorists and those in information 
science, although it seems to me that it has much to contribute. 
This may be because the library concept of indexable matter 
leads to "shallow" rather than "deep" indexing. Although 
there is no really satisfactory method of counting what 
constitutes an "entry" in an index on a uniform basis, there 
sometimes seems to me to be a further mistaken idea held by 
those who make this criticism: that a multiplicity of index 
access points, regardless of their nature, makes an index "deep" 
and therefore somehow good. This is not the point I wish to make. 

Library subject cataloging does indeed have a concept of 
indexable matter - the subject or subjects of the bibliographic 
unit as a whole - which produces few entries indeed as compared 
with the number required for intensive indexing of the 
literature of a discipline. We need only think, for example, of 
the subject indexes to Chemical Abstracts, where any new chemical 
information defines what concepts must be chosen for 
indexing, to see that this is the case. 



FORM OF EXPRESSION 

Defining indexable matter, however, as we noted at the 
beginning of this paper, constitutes only the first of several 
aspects of vocabulary control, the selection of concepts or 
things for indexing. The second, as you will remember, is the 
control then of the form of expression of that indexable matter, 
the area usually thought of as vocabulary control proper, as 
exemplified by the use of heading lists or thesauri. 

It is in this area that the library experience has been 
very great indeed. Building upon Cutter's all too briefly 
expressed basic outline, we now have nearly a hundred years 
of experience in this particular art, codified in innumerable 
lists, with learned commentary by such experts as Haykin, 
Metcalf, Frick, and, among my own immediate colleagues, 
Tauber, Frarey, and Lilley. The question of depth of 
indexing should not be allowed to prevent proper use of this 
experience in meeting indexing problems. 
THE IDEAL INDEX 



An ideal index would have its scope clearly defined in 
a number of ways. The interested user would be able to 
determine what sources were covered, what concepts or types of 
concepts had been selected as indexable matter, what the form of 
expression of the concepts would be, how this expression would 
be modified or subdivided, precisely what context would accompany 
the resulting entries, and exactly what the arrangement of the 
entries was. All of these things would be definable in terms 
so rigorous that, barring minor clerical slips, one indexer 
using the same criteria should be able to replicate exactly 
what another indexer had created as an index to the same 
material. Insofar as this were possible, we might be able to 
call indexing a science rather than an art. 

In point of fact, as a number of studies show, conventional 
indexing falls far short of this ideal, even when the test is 
to have the same indexer re-index the same material under the 
same conditions but after a lapse of time. A significant 
number of the differences which arise seem to be due to problems 
of vocabulary control. At least one study, that by Dr. Ann 
Painter, would indicate that librarians applying library-style 
cataloging rules are somewhat more consistent in the entries 
they produce than are other types of indexers. 

The extent to which this is due to the definition of 
indexable matter, or to the types of controls of form of 
expression involved, is not discussed as such in her study, but 
it might be supposed that both have a role to play. 



EXISTING INDEXES MEETING "IDEAL" CRITERIA 

There are, of course, existing indexes which come close, 
very close indeed, to realizing most of the criteria of exact 
definition which I have given as necessary for truly scientific 
indexing, and for which a second indexing would produce an 
index almost if not exactly identical with that produced the 
first time. 

The indexes which approach this ideal of rigor in the 
definition of the procedures followed are indexes of the 
keyword-in-or-out-of-context type, or indexes following the 
citation indexing principle. 

As a means for indexing the literature of a discipline 
so as to permit reasonable retrospective search, these tools 
turn out to be rather mediocre indexes if you add one additional 
criterion to the above: that the index should also offer, within 
a reasonably usable compass, at least as good access to the 
material indexed as those indexes which we subjectively 
recognize as "good" manual indexes. 

In many ways, of course, the actual examples of these 
methods of indexing are neither as pure nor as simple as they 
appear to be on first acquaintance. 



CITATION INDEXES 

A citation index restricts the concept of indexable matter 
very narrowly indeed, to the works cited by the works 
indexed. In practice, however, repetitions of citations given 
in a single article are suppressed, and judicious elimination 
of some citations not (by subjective judgment) of value to 
the index user might both lower the bulk of the resulting 
index and improve its usefulness. Citations must be regularized 
in form to permit their arrangement and merging in a citation 
index, and forms of author names must be determined; even these 
steps are not simple, nor readily done without the exercise 
of human judgment on each individual item, or without 
what amounts to vocabulary control lists of acceptable 
abbreviations and journal names. 



TITLE KEYWORD INDEXES 

There have been a number of studies, perhaps the best 
known of which is that by Montgomery and Swanson, which when 
read superficially appear to indicate that keywords chosen 
from titles correspond closely to human indexing of the same 
material. The Montgomery and Swanson study found that at least 85% 
of titles in the Index Medicus contained either the index term 
used or its "synonym." A replication of the study done at 
Columbia indicates that it included as synonyms many very 
broad classes to which the index term belonged, and vice versa, 
and that if synonyms were more conventionally defined, only 
fifty-odd percent of the titles contained the indexing term. 

Other studies of the same kind [...] 

In addition, of course, synonymy is one of the major problems 
causing scatter in indexing, which vocabulary control systems 
are intended to minimize. 

None of the studies known to me which compare title 
keywords with more conventional subject indexing employing 
vocabulary control consider the problem of subarrangement of 
entries under keywords, although there has been considerable 
research by [...] and by Kollin in seeking rigorous means of producing 
a reasonable subarrangement within keyword-from-title indexes. 
Kollin has also sought to deal with synonymy. 

For practically all keyword-in-context indexes, the 
subarrangement is accidental, in that it is based on the word 
following the keyword itself. This at least means that if 
the concept is expressed by a multiple-word term, such terms 
fall together in the index. Not even this much is true of 
keyword-out-of-context indexing. 
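To make the "accidental" subarrangement concrete, here is a minimal Python sketch of a keyword-in-context listing. It assumes a small hypothetical stop list and plain whitespace tokenization (actual KWIC programs differed in both), and the sample titles and document numbers are invented. Entries are sorted on the keyword and then on the words that happen to follow it, so the two titles containing "subject headings" fall together under SUBJECT, while the title on subject indexing simply follows them.

```python
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}  # hypothetical stop list


def kwic_index(titles):
    """titles: mapping of document number -> title.

    Returns (keyword, following, preceding, doc_no) tuples, one per significant word.
    """
    entries = []
    for doc_no, title in titles.items():
        words = title.lower().split()
        for i, word in enumerate(words):
            if word in STOP_WORDS:
                continue
            following = " ".join(words[i + 1:])
            preceding = " ".join(words[:i])
            entries.append((word, following, preceding, doc_no))
    # Sorting on the keyword and then on whatever words follow it is the
    # "accidental" subarrangement: multiword terms beginning with the keyword
    # end up adjacent, but nothing else is grouped meaningfully.
    entries.sort()
    return entries


def format_kwic(entries, width=30):
    """Render each entry as: preceding context | KEYWORD following context (doc_no)."""
    return [f"{pre:>{width}} | {kw.upper()} {post} ({doc})"
            for kw, post, pre, doc in entries]


if __name__ == "__main__":
    sample = {
        101: "Subject headings for school libraries",
        102: "Indexing with subject headings",
        103: "Subject indexing in the social sciences",
        104: "Circulation control in school libraries",
    }
    for line in format_kwic(kwic_index(sample)):
        print(line)
```

Nothing here regularizes the keywords themselves: a title using "catalog headings" instead of "subject headings" would file elsewhere, which is the synonymy problem noted above.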

For machine purposes, most existing keyword-out-of-context 
indexes are subarranged by an accession or other number not 
logically useful to the user. This is not a necessary 
restriction, of course, but by definition indexes of this kind 
do not provide subject subdivision or modifiers, or even 
subarrangement by author or title. 
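A keyword-out-of-context listing can be sketched in the same way. In this minimal Python version the only subarrangement under each keyword is the accession number, matching the common practice described above; the stop list, the accession numbers, and the titles are hypothetical. With the sample data, the two titles about subject headings (101 and 103) are separated under SUBJECT by an unrelated title (102), because the words that follow the keyword play no part in the order.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}  # hypothetical stop list


def kwoc_index(titles):
    """titles: mapping of accession number -> title.

    Returns keyword -> list of accession numbers, in accession-number order.
    """
    index = defaultdict(list)
    for acc_no in sorted(titles):  # accession order is the only subarrangement
        for word in dict.fromkeys(titles[acc_no].lower().split()):  # each word once per title
            if word not in STOP_WORDS:
                index[word].append(acc_no)
    return dict(sorted(index.items()))


def format_kwoc(index, titles):
    """Keyword as a heading, then each title printed whole (out of context)."""
    lines = []
    for keyword, acc_nos in index.items():
        lines.append(keyword.upper())
        lines.extend(f"    {acc_no}  {titles[acc_no]}" for acc_no in acc_nos)
    return lines


if __name__ == "__main__":
    sample = {
        101: "Subject headings for school libraries",
        102: "Subject indexing in the social sciences",
        103: "Indexing with subject headings",
    }
    for line in format_kwoc(kwoc_index(sample), sample):
        print(line)
```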

It is certainly true that we still lack anything 
resembling rigorous means of evaluating indexes or indexing methods, 
despite the substantial contributions to the literature since 
the Cleverdon studies. It does seem to me, however, that 
studies indicating inconsistency in human indexing are not by 
any means an adequate argument for abandoning attempts to 
secure consistency in entry expression, nor an adequate argument 
for achieving consistency in the choice of indexable matter 
by restricting that choice to words - not even concepts - which 
happen to appear in titles. 

For the moment, it seems to me that the most convincing 
arguments against title keyword indexing as the means for 
providing an index for retrospective searching of the literature 
of a discipline are subjective and circumstantial. They are 
nonetheless quite convincing, as anyone will find who consults 
cumulations of B.A.S.I.C., the index to Biological 
Abstracts, and tries to check such subjects as 
Blood, or Rat, or Rats. Since there is even less 
correspondence between keywords from titles and concept subject 
indexing in the social sciences than in the physical sciences or 
life sciences, the method certainly does not seem promising for 
indexing the literature of librarianship. 






Real indexes based on keywords taken from titles are often 
enriched (added to) where the title is not expressive, have 
forms of words in titles altered on input by human beings, 
hyphenate where Webster would not in order to make subjects 
expressed in more than one word arrange as subjects rather 
than as isolated words, and involve other deviations from the 
pure definition of their scope and expression. It is probably 
quite safe to say that these changes made to improve the form 
of expression of the subject are generally in the direction of 
similar decisions made earlier by those compiling manual indexes 
and catalogs, and recorded in such tools as the library subject 
heading lists. Such deviations require human judgment 
and add to the cost as well as the quality of the resulting 
index. I know of no studies of the amount of alteration in 
existing indexes of this kind, or of the add-on cost of various 
amounts of change. 

Essentially, too, insofar as they remain rigorously defined, 
title keyword indexes shift onto author and/or editor both 
the choice of indexable matter and the form of its expression. 
Even when the authors, as was the case with at least one set 
of the proceedings of the American Documentation Institute 
(now the American Society for Information Science), are warned 
in advance that this type of indexing will be done, are 
information specialists or scientists themselves, and are highly 
motivated to bring their papers to the attention of their 
colleagues through the index, the results are neither suitable 
for use without considerable post-editing, nor very satisfactory, 
even for a small index, once this post-editing has 
been done - at least, as compared with our subjectively "good" 
manual index with anything approaching the same number of 
access points. For indexes of this kind we still retain for 
an indexer, too, the problems of regularization of author 
names, titles, and so forth. 



STICHWORT ENTRY AND ROTATIONAL INDEXING 

"We learn from history," Spengler tells us, "that man learns 
nothing from history." Stichwort indexing - the idea of getting 
a kind of subject indexing by entering under the most 
important subject word in the title, and inverting the title to 
provide the necessary context - is a very old idea, going 
back at least to the late 15th century, which is still practiced 
today. Indeed, it is often practiced in the indexes issued by 
very learned scientific journals whose pages urge improvement 
in indexing techniques for the scientific literature. A great 
deal of the indexing in Poole was essentially Stichwort. 
Crestadoro even suggested a technique he called rotational 
indexing - making Stichwort entry on all of the (manually 
determined) "important" words in the title. Such techniques 
were not successful, primarily for lack of adequate vocabulary 
control. 
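The difference between Stichwort entry and rotational indexing can be shown with a short Python sketch. The inversion convention used here (entry word, the rest of the title, then a comma and the words that preceded it) is an assumption for illustration, not Poole's or Crestadoro's exact practice, and the choice of "important" words is left to the caller, just as it was left to the indexer. Nothing in the sketch regularizes the entry words, so "Rat" and "Rats" would still file apart; that is the missing vocabulary control.

```python
def rotational_entries(title, important_words):
    """Make one inverted (Stichwort-style) entry per manually chosen word.

    Passing a single word gives Stichwort entry; passing every "important"
    word gives rotational indexing. The inversion convention is assumed.
    """
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        if word.lower() not in important_words:
            continue
        # Entry word first (capitalized), then the rest of the title ...
        head = " ".join([word[:1].upper() + word[1:]] + words[i + 1:])
        # ... then the words that preceded it, inverted after a comma.
        entries.append(f"{head}, {' '.join(words[:i])}" if i else head)
    return sorted(entries, key=str.lower)


if __name__ == "__main__":
    print(rotational_entries("History of indexing", {"indexing"}))
    print(rotational_entries("Circulation control in school libraries",
                             {"circulation", "libraries"}))
```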



AUTHOR AND EDITOR INDEXING 

It would, of course, be nice if authors and editors supplied 
titles that were fully regularized and that expressed what the 
text was about, even though this would still not supply useful 
indexing handles for all of the indexable matter which might be 
required and economic for a particular index. But the idea is 
less enchanting than it at first appears, at least for anything 
other than relatively small listings intended for current 
awareness purposes. 

To make such a system a success, we would be confronted 
with the problem of teaching all authors, or at least all 
editors, how to be indexers. It is hard enough just to teach 
indexers this, despite the fact that they are quite properly 
more strongly motivated toward indexing than we have any right 
to expect authors or editors to be, and know more about the 
structure and possible uses of the particular index into which 
their entries must fit. For that matter, they also know that 
theirs is a particular index with a particular set of users. 

As we have seen, keyword-from-title indexes, however 
regularized, cumulate badly, and are capable of growing only to a 
certain size without becoming unmanageable to use for lack of 
meaningful subarrangement. This is a matter of vocabulary 
control in the provision of subheadings or modifiers with a 
role in determining overall index structure, as well as a 
vocabulary control problem in subject expression. 



THESAURI FOR AUTHOR OR EDITOR USE 

Related to the idea of having editors or authors provide 
"proper" titles for keyword indexing is another currently popu- 
lar suggestion. This is that we provide editors, or authors, 
or somebody, with lists of regularized words or terms, or 
thesauri, or subject heading lists, and have decentralized 
production of indexing terms which will appear with the article 
or report and may subsequently be centrally filed, or filed by 
the user, thus producing instant indexes. 

While a number of journals, particularly engineering 
journals, have begun to include entries from a list of this kind 
with the articles when they are published, I am so far aware of 
only one which uses these entries in the index published by the 
journal itself. At least one publisher of technical journals 
has included the entries with articles in some of its journals, 
but does not use them either in the indexes published for the 
journals or in the extensive in-house indexing of its journals. 
As far as I know, no index covering a group of journals uses 
the index entries thus produced, and I know of no special lib- 
raries using them. The approach does not seem to have caught 
on, and this seems likely to be for a combination of reasons 
involving vocabulary control: lack of definition of indexable 
matter in using lists, possible unsuitability of the lists, and 
lack of an indexing structure into which entries can be fitted. 
Decentralized indexing today, then, does not seem to prove 
notably more successful - perhaps not as successful - than 
decentralized indexing efforts in the past, of which Poole's index 
is perhaps the outstanding example. In Poole's case, of course, 
the index was centrally edited, and he used experienced 
catalogers and indexers, but lacked a list for vocabulary control. 



SUBJECT LISTS AND THESAURI 

Current thesauri or vocabulary control lists seem to have 
an interesting difference from library subject headings and 
information file listings which I have not seen previously dis- 
cussed in the literature. Most of them, despite provisions for 
updating and for the addition of new terms, are, in comparison 
with the library lists, actually classifications arranged in 
alphabetical order. 

Let me see if I can clarify this difference. In using a 
library classification scheme, despite its synthetic aspects, 
what we are basically dealing with are arrays of pre-established 
pigeonholes, and our task is to place our item in the most 
appropriate one. When we use a library heading list, our approach 
is first to determine the subject of the work and then to use 
the list to regularize its expression or, where experience has 
shown this to be necessary or desirable, to classify the work 
in some way instead. But if the subject has not been given in 
the list and we are not told by analogy with the list to 
deviate from our general instruction to enter under the subject, 
we create our own heading in the spirit of the list, add it to 
the list, and go on with our work. 

It is true that, for some types of entries and by some
subject headers, this procedure is more often breached, through
ignorance, than observed. But in most cases the principle is
clearly followed: a book about Man o' War is not entered under
Racehorses, nor a book about Mt. Washington under Mountains, nor
even under Mountains - U.S., despite the fact that neither Man
o' War nor Mt. Washington appears in the lists.



VOCABULARY CONTROL OR VOCABULARY LIMITATION?

For various reasons - mostly, I suspect, technological
in inception and only secondarily intellectualized - most
thesauri would have the indexer place each concept he chooses
for indexing under an existing heading in the list; the thesaurus
of the American Petroleum Institute is the only exception
to this known to me.

Placing each concept under an existing heading is an act
of classification in that, as is the case with library
classification schemes, it means that the indexer must choose an
existing pigeonhole from an array before him; the only difference
is that the arrangement of the array, and of the entries in the
resulting index, is alphabetical rather than classified.

This leads to an important point about the nature of
library subject heading lists for vocabulary control, a point
which carries over to published indexes such as the Wilson
indexes, which follow similar principles for vocabulary control.
It is clear that, at least for very large classes of subjects, 
their presence on the list is implicit, even if they are not 
actually printed in the list or have not been previously used 
in the index or catalog. 

Subjects of this kind include the names of persons and
institutions, geographical names, particular species of birds
or animals or fishes, kinds of games - the list is literally
endless, even if we do not include types of subjects which are
sometimes (mistakenly, in my view) entered only under
broader class headings by some catalogers and indexers: names
of computer languages, names of particular chemical compounds,
or names of particular devices or machines. This is sometimes
justified in library practice by stating that direct entry
should be made down to the level of the species, but that
varieties should be given class entry. This may be easy to see
in biology, but becomes more difficult and less readily
justifiable in other subjects.

Note that in the library lists, or for the bulk of the
indexing done in the fashion of, say, the Wilson indexes,
control of the form of expression of concepts not previously
indexed or entered in vocabulary control lists is done either
by rule (as in the case of using library entry rules for names
of persons and institutions, for example), or by analogy with
previous indexing decisions recorded in the list or index, or
by using another index or list or reference work as authority
(Chemical Abstracts for names of compounds, for example, or
a particular gazetteer for place names).

In the latter case, the authorities chosen become in fact
extensions of the heading list or thesaurus, effectively
extending the list for vocabulary control purposes by literally
millions of items without swelling its bulk. Where individual
decisions must be made for subjects, these may be added to the
list at the time of first need and become authority for the form
of expression of future occurrences of the same concept when it
again appears as indexable matter. All entries are, of course,
added to the index itself, where they may serve as authority
just as if they were in a separate listing.
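
The procedure described in the last few paragraphs (consult the list and its references first, defer to an outside authority where the list relies on one, and otherwise create a new heading and record it at first need) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any actual Wilson or Library of Congress system: the class `HeadingList`, its fields, and all sample headings are hypothetical, and the `external` mapping merely stands in for a gazetteer or similar reference work.

```python
from dataclasses import dataclass, field


@dataclass
class HeadingList:
    """Hypothetical open heading list: accepted headings plus 'see'
    references from unused forms to the heading actually used."""
    headings: set[str] = field(default_factory=set)
    see: dict[str, str] = field(default_factory=dict)
    # Stand-in for an outside authority such as a gazetteer (an assumption).
    external: dict[str, str] = field(default_factory=dict)

    def heading_for(self, concept: str) -> str:
        # 1. Regularize through the list's own references and headings.
        if concept in self.see:
            return self.see[concept]
        if concept in self.headings:
            return concept
        # 2. Defer to the outside authority, which extends the list
        #    without being copied into it.
        if concept in self.external:
            return self.external[concept]
        # 3. Otherwise accept the concept as a new heading and record it
        #    at first need, so that it governs later occurrences.
        self.headings.add(concept)
        return concept


hl = HeadingList(
    headings={"Racehorses"},
    see={"Race horses": "Racehorses"},
    external={"Mount Washington": "Washington, Mount (N.H.)"},
)
print(hl.heading_for("Race horses"))       # Racehorses
print(hl.heading_for("Man o' War"))        # Man o' War
print(hl.heading_for("Mount Washington"))  # Washington, Mount (N.H.)
print(sorted(hl.headings))                 # ["Man o' War", 'Racehorses']
```

Under a closed thesaurus of the kind discussed above, the second call would instead have to map Man o' War onto an existing term such as Racehorses; the open list simply records the new heading, while the outside authority supplies its form without adding to the list's bulk.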

Notice that this method of vocabulary control certainly
does not preclude classed entry of the concept in the index,
either in place of subject entry proper or (as usually seems
preferable to me, as it did to Kaiser) in addition to it, nor
does it prevent the use of the same list for the control of the
vocabulary of expression of the classed concept.

Class entry is not usual in library catalogs, with certain
well-established exceptions, such as alphabetico-classed entry
for historical topics. To some extent the lack of classed
entry is made up for by the classed shelflist and classed array
of entries on the shelves. This is not true of indexes to
classified abstracting services, since the classification in this
case is usually broad rather than narrow and is, in any case,
not cumulated. It serves instead to provide groupings which
are readily scanned for current awareness purposes.

In indexing, classed entries or classed arrays are
frequently more desirable, either to augment subject indexing or
to group some types of entries for special purposes, generally
based on common needs of users of the particular index or
indexing service.

Perhaps because of the concurrent use of shelf
classification, library heading lists (although they include a
syndetic apparatus which serves some of the same purposes) do not
include the kind of classification of the headings themselves
found in some thesauri, expressed as "broader terms" and
"narrower terms" or "generic for" and "specific to". Cutter felt
that a classified listing of library headings would be very
useful, but was too difficult and expensive to maintain. The
experience of current thesaurus builders may be helpful in
settling both questions, of usefulness and of cost. It will be
interesting to note, too, how these listings will deal with
topics whose class relationships, at least for an index of broad
coverage, are not, as Haykin puts it, "obvious or common": ink,
for example, or the White House.

The Sears list did and the Library of Congress list does
include, however, classification numbers from the Dewey and Library
of Congress classifications respectively. These were intended
primarily as scope notes or suggestions for classification,
not to provide a classification of the headings. In the
case of the Library of Congress list, they were intended to be
included when, and only when, the class was co-extensive with
the heading, but this has been done only inconsistently.

In the thesauri, it does not seem to me to be clear exactly
how this classed apparatus, as opposed to the syndetic apparatus
carried over from library schemes, is intended to be used: that
is, whether it is intended for the indexer - and if so, in what
way - or for the user of indexes based on the thesaurus. Where
indexes are maintained in machine-readable form, and where
entries are automatically also posted to the next upward step
in the hierarchy on the machine-readable record (though not,
for reasons of bulk, included in the printed indexes), it will
be interesting to see the extent and nature of use of this feature.
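
The arrangement just described, in which each entry is also posted one step up the hierarchy in the machine-readable record while only the direct entries reach the printed index, might look something like the following Python sketch. The broader-term links, the term names, and the functions `post`, `printed_index`, and `search` are illustrative assumptions, not features of any particular thesaurus or production system.

```python
from collections import defaultdict

# Hypothetical broader-term links (narrower term -> broader term).
BROADER = {
    "COBOL": "Programming languages",
    "Programming languages": "Computer software",
}


def post(record, term, doc_id):
    """Post doc_id under its term and under the next broader term only."""
    record[term].add((doc_id, "direct"))
    broader = BROADER.get(term)
    if broader is not None:
        record[broader].add((doc_id, "upward"))


def printed_index(record):
    """The printed index carries direct entries only, to limit its bulk."""
    index = {}
    for term, postings in sorted(record.items()):
        direct = sorted(d for d, kind in postings if kind == "direct")
        if direct:
            index[term] = direct
    return index


def search(record, term):
    """A machine search sees both direct and upward postings."""
    return sorted({d for d, _ in record.get(term, set())})


record = defaultdict(set)
post(record, "COBOL", "doc1")
post(record, "Programming languages", "doc2")
print(printed_index(record))  # {'COBOL': ['doc1'], 'Programming languages': ['doc2']}
print(search(record, "Programming languages"))  # ['doc1', 'doc2']
print(search(record, "Computer software"))      # ['doc2']
```

Because doc1 is posted only one step up, a search under Computer software does not retrieve it; how far upward posting ought to go is the kind of question that observed use of the feature would help to answer.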

In a sense, too, these classifications in the thesauri
serve, in the list though not in the index, as a kind of upward
cross-reference of the type usually avoided in indexes. If
users actually do employ the thesaurus as an aid in searching,
as appears to be the intent of some of these lists, and as
might be made necessary by the very extensive use of class
entries, we may be able to test the effect of upward
cross-references. While upward cross-references have been
advocated by some cataloging experts, they seem to propose
selective and judicious, rather than overall, use of such a feature.

A separate heading list or thesaurus is not, of course,
required to achieve exactly the type of vocabulary control
associated with listings. Heading lists grew from the common
indexing practice of achieving regularization of expression
and guidance in the choice of indexable matter by consulting
the previous indexing used in the same index or indexing
service, or by consulting other indexes upon which their own may
be modeled, as catalogers frequently consult the Wilson indexes
for form of expression for new subjects. Provided that additional
apparatus, such as a record of reverse cross-references and
scope notes, is kept within or outside the index, the result
is the same as with a separate subject authority listing,
though sometimes less convenient to use. The use of the index
itself, as guidance to interpretations of the list, for example,
is usually an essential aid to the indexer even when a thesaurus
or list is maintained.

The separate vocabulary control listing is desirable for
convenience, as an aid to starting a new index, as a device to
standardize form of expression across indexes (even indexes
with entirely different interpretations of indexable matter),
as an aid in creating local entries which will fit with those
issued by a centralized service, and as a place to record
decisions or control apparatus (reverse cross-references, scope
notes, etc.) which would swell the bulk of the index itself
without aiding the user. A separate authority list, too, is
easier to edit and serves as an easier-to-use record of changes
in entry form.

If we seek to make a general assessment, however premature
this may be, of the new thesauri, and to judge them on the
basis of their success in use, it becomes evident that at least
some of those most discussed have never been used on any
significant scale. Those which seem to me to be the most
successful are those which have been based, in their form of
expression of terms and the choice of terms to be included, upon
actual indexing practice and experience as well as upon the
useful advice of subject experts.

Of the larger and more ambitious listings, nearly all of
the most successful have been based on substantial library or
indexing practice or substantially refined after actual use:
Medical Subject Headings (MeSH), the Bureau of Ships Thesaurus,
the American Petroleum Institute Thesaurus and the National
Aeronautics and Space Administration Thesaurus. Some of these,
particularly the NASA Thesaurus, seem to have gone beyond the
library lists in important respects: the provision of a classed
listing or a listing by broad categories; of a separate list
of subjects to be used; of indications of the broader and
narrower class relationships of topics separate from the syndetic
apparatus proper; of permuted listings of the terms; and of
different and clearer control terminology ('use for' and 'refer
from', for instance - although the latter was formerly used in
the Sears list).

In other respects, they seem still to lack important
features to be found in the library listings, or their related
apparatus. There are few clear, or at any rate published,
explanations of the way in which indexable matter is to be
selected. The discussions of this in Haykin and in Sears are
certainly not completely unambiguous, any more than the
definition used by Chemical Abstracts indexers, but they do seem
reasonably functional. Perhaps the most outstanding lack in
the thesauri is that of adequate provision for subject
subdivision. I may be jumping to conclusions too early, but it is
probably safe to say that the indiscriminate use of roles and
links is dead; that their effective use in future indexes will
be selective, less frequent than has previously been advocated,
and designed for machine use, not publication.

It is evident that much research needs to be done before
vocabulary control in indexing can become anything resembling
an exact science. Since indexing vocabularies are inevitably
linguistic in nature - and this applies even to classification
schemes - and since the material to be indexed is expressed in
language, it seems impossible, in the same sense that machine
translation is impossible, that there will ever be an exact
and rigorous means for carrying out the total task. This makes
it all the more important to develop exact and rigorous methods
which contribute to useful indexing in whatever areas this is
possible, so as to limit those areas in which separate judgments
for each item are required.

Much of current indexing research which has to do with
vocabulary control seems to have been done ab ovo, without regard
to the codified record of the information we have gained
empirically through years of experience. Because its record has
been more carefully codified, and tested over a longer period
of time, the library experience in subject heading work is
probably that which has the most to contribute. It seems to have
been largely ignored because it is primarily used for a
relatively shallow form of indexing, as judged by its definition
of the scope of indexable matter.



TENTATIVE CONCLUSIONS 

It would seem useful, then, to offer a group of tentative
conclusions about vocabulary control based largely on that
experience, but considered and presented in the context of
indexing the literature of librarianship.

1) Previous indexing, provided an adequate record of
syndetics is maintained, can constitute as rigorous a control
of vocabulary as any listing especially designed for that
purpose, although it may be less convenient to use.

2) Limitation of the size of vocabularies may be required
for technical reasons. If this is the case, however, the
reasons for the limitation, and its nature, must be clear to
the indexer and to the interested and concerned user.

3) While it is useful to have expert assistance in
defining the scope of particular terms, or in suggesting terms
for inclusion in a list, this can constitute only a
strengthening of, not a substitute for, building a vocabulary
from material like that to be indexed and fitting the terminology
into a particular indexing structure.

4) Vocabulary requirements, particularly the questions of
class entry, of class and subject or aspect subdivision or
modification, and the nature of the references, vary widely
with the estimated size of the index, whether or not it is
designed to be cumulated, and whether or not it is to be
periodically closed and post-edited.

5) Experience would appear to show that, at least in the
current state of the art, indexing for a discipline like
librarianship by means of citation indexing or title keyword
indexing, to provide for both current and retrospective search,
cannot achieve sufficient vocabulary control to be as useful
as the best of existing tools, although the former may be a
useful adjunct for certain types of searches in depth, and the
latter may be a useful current awareness tool, particularly in
indexes of relatively small size. It would seem premature to
invest in either before we have adequate investment in current,
ongoing indexes.

6) While post-coordination of indexing terms may be a very
valuable supplement to published conventional subject indexes,
its vocabulary requirements appear more and more to be the
same as those of more traditional forms of indexes.

7) Author, editor, or other source indexing, other than
by encouragement of the use of more explicit titles and greater
bibliographic regularization for use in title keyword indexes
for current awareness purposes, does not appear to offer a
practical solution to the problem of producing an index to the
literature of a discipline.

8) Lacking a far greater knowledge of the indexing process
than we have at present, decentralized indexing, with or
without a thesaurus control device, for use in smaller journal or
report indexes, followed by centralized cumulation into an
index to the literature of a discipline, does not appear
practicable. The opposite procedure, centralized indexing
for an index to the literature of a discipline producing index
entries which can also be used, for example, to provide indexes
for individual journals, appears more likely to be practicable
at present, although it would require careful design and
considerable experimentation.

9) Large indexes designed for cumulation and retrospective
search are not practicable without some form of subject
subdivisions or modifiers to produce useful subarrangement of the
material.

10) It seems possible to make a number of specific
statements about the actual form of expression of indexing terms:

a) Entry terms should be in the form of expression
most likely to be known to the user, with references
from other forms: ASLIB rather than Association of
Special Libraries and Information Bureaux; COBOL
rather than Common Business Oriented Language.

b) In general, entry forms should be in the plural
rather than in the singular, and word-by-word filing
should be used. While some very successful large
indexes use the singular and letter-by-letter filing,
they all require more filing modifications than might
otherwise be necessary. The singular is often used
because it seems to produce, especially with
letter-by-letter filing within the subject proper, more
classed groupings than entry under the plural. The
plural form, however, is not only a more natural way
of expressing a topic or subject (a work refers to
computers, not to computer), but permits useful
discrimination in those cases where the singular connotes
the general and the plural particular aspects of it
(Engraving, Engravings).

c) Homographs should be qualified by an expression
indicating what is meant; placing the expression in
parentheses is a widespread means of doing this, and
might be accepted as standard.

d) Many concepts cannot be expressed except as 
phrases, and should be so expressed. 

e) Headings composed of adjective and noun should not
usually be inverted, particularly if the purpose of
the inversion is only to achieve a classed
arrangement of the entries.

f) Subdivisions of or within a subject intended to
constitute conceptual and, therefore, arranging breaks
should be clearly indicated by punctuation, typography,
or spacing.

g) Subjects may be subdivided by facets, aspects,
generalized classed groupings (applicable to a range
of similar subjects), ad hoc classed groupings, or by
tailored modifiers, like those in Chemical Abstracts
subject indexes. Subdivision should take into account
the number of entries likely under a topic, display
or lack of it, and the usefulness of the division
methods chosen, as well as problems of cumulation. It
might be pointed out here that the library lists are
quite sophisticated, providing subdivisions which may be
used with any heading, subdivisions printed once which
may be used with any of certain classes of headings, and
"divide like" instructions. This is far more
sophisticated than such devices as roles or obligatory
subfaceting in some proposed faceting schemes, and is an
area which deserves and requires further research.
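
The contrast drawn in item b) between word-by-word and letter-by-letter filing is easy to demonstrate mechanically. The Python sketch below is a deliberate simplification, assuming that filing keys are formed by lowercasing and discarding punctuation, and that spaces (and hyphens) separate words; published filing codes handle many further cases, and the function names are illustrative only.

```python
import re


def _words(heading: str) -> list[str]:
    # Lowercase and keep only runs of letters and digits (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", heading.lower())


def word_by_word_key(heading: str) -> tuple[str, ...]:
    # "Nothing before something": a word files before any longer word it begins.
    return tuple(_words(heading))


def letter_by_letter_key(heading: str) -> str:
    # Word breaks are ignored, so the heading files as one continuous string.
    return "".join(_words(heading))


headings = ["Newspapers", "New York", "Newark", "News"]
print(sorted(headings, key=word_by_word_key))
# ['New York', 'Newark', 'News', 'Newspapers']
print(sorted(headings, key=letter_by_letter_key))
# ['Newark', 'News', 'Newspapers', 'New York']
```

Word-by-word filing keeps New York at the head of the headings beginning with the word New, while letter-by-letter filing moves it after Newspapers; separations of this sort are among the cases that call for the special filing modifications mentioned in item b).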

Again, these observations must appear commonplace to an
audience of librarians, but many of us seem attracted to more
sophisticated systems, or machine-based systems, where other
aspects of indexing may be better performed than in conventional
indexing, but where we are accepting advice from less experienced
or qualified people in questions of vocabulary control, often,
again, erroneously believing that the advice reflects equipment
requirements.

It seems unquestionably true that newer techniques,
primarily the use of machine-readable copy, computer-based
production of indexes, and the use of computer facilities for
special-purpose searches and bibliographies, can lower unit
costs, improve speed of production of indexes, and provide for
greater flexibility in the indexing process.

In seeking to extend control over the literature of
librarianship and documentation, however, we should keep in mind
that we need a flexible tool for retrospective searching, and that
more rapid indexing for the major existing indexing service
would either enable concurrent production of a current awareness
service at a reasonable cost, or substitute for it.

Further indexing vocabulary analysis would be desirable
for library literature, unquestionably, but it would seem sensible
to base this on Library Literature, the only substantial index
in the field, the only index with an established, comparatively
sophisticated vocabulary control, and the only index with a record
of headings used for substantial amounts of literature in the
field. The major vocabulary problems now seem to be those of
speed of introduction of new terms, the nature of class headings,
and the uncertain terminology of the field.

In the interest of increasing the access to our literature,
then, it would seem most reasonable to build upon existing
strength, particularly when we consider the present high quality
level achieved with such a small staff. I would propose aid
to the Wilson Company in exploring new production methods, and
for research in indexing vocabularies and procedures, as well as
urging support, through subsidy if necessary, for expanding the
staff, depth, and scope of the index, to become Library and
Information Science Literature, and to make it a model index.